Clustering compositional data trajectories

Authors

  • Francesca Bruno
  • Fedele Greco
Abstract

This work is motivated by the following question: given a sample of compositional data trajectories (i.e. sequences of composition measurements along a domain), how can one design a segmentation procedure that leads to homogeneous classes? In other words, our contribution studies statistical methods for clustering compositional data when the observations are trajectories of compositions. Observed trajectories are known as “functional data”, and several methods have been proposed for their analysis. In particular, methodologies for clustering trajectories are known as Functional Cluster Analysis (FCA) (Ramsay and Silverman, 2005). However, FCA techniques have not yet been extended to compositional data trajectories. To this end, FCA clustering techniques have to be adapted by using a suitable algebra for compositions (Aitchison, 1986). In this work, we propose a methodology consisting of a preliminary smoothing of the compositional trajectories, followed by the construction of suitable metrics for both partitional and hierarchical clustering. A simulation study is performed to check the proposed methodologies. The quality of the results is assessed by means of several indices (Halkidi et al., 2001). Finally, an environmental application is developed: the methodologies are applied to a real dataset containing measurements of particulate matter vertical profile compositions on different days. The aim of the application is to detect typical behaviours (clusters) characterizing the vertical profiles of particulate matter composition.
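
As a rough illustration of what a metric built on the algebra of compositions can look like, the sketch below computes a pairwise distance matrix between compositional trajectories. It is not the authors' implementation. It assumes strictly positive parts, trajectories sampled on a common, equally spaced grid, and a root-mean-square of pointwise Aitchison distances as the trajectory metric; the smoothing step is omitted, and all function names and toy values are hypothetical.

```python
import math


def clr(composition):
    """Centred log-ratio transform of a strictly positive composition."""
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Aitchison distance: Euclidean distance between clr coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


def trajectory_distance(traj_a, traj_b):
    """Root-mean-square of pointwise Aitchison distances.

    Assumption for this sketch: both trajectories are observed on the
    same equally spaced grid, so a plain average stands in for an integral.
    """
    squared = [aitchison_distance(x, y) ** 2 for x, y in zip(traj_a, traj_b)]
    return math.sqrt(sum(squared) / len(squared))


def distance_matrix(trajectories):
    """Symmetric matrix of pairwise trajectory distances."""
    n = len(trajectories)
    return [
        [trajectory_distance(trajectories[i], trajectories[j]) for j in range(n)]
        for i in range(n)
    ]


# Toy data (made up): three trajectories of 3-part compositions on 2 grid points.
t1 = [(0.2, 0.3, 0.5), (0.3, 0.3, 0.4)]
t2 = [(0.4, 0.6, 1.0), (0.6, 0.6, 0.8)]  # t1 rescaled: same compositions
t3 = [(0.7, 0.2, 0.1), (0.6, 0.3, 0.1)]

D = distance_matrix([t1, t2, t3])
```

Because the clr transform is invariant to rescaling, `t1` and `t2` are at distance zero up to floating-point error, while `t3` lies clearly apart. A matrix such as `D` could then be passed to any standard partitional or hierarchical clustering routine.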

Similar resources

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...

Measuring the Similarity of Trajectories Using Fuzzy Theory

In recent years, with the advancement of positioning systems, access to a large amount of movement data is provided. Among the methods of discovering knowledge from this type of data is to measure the similarity of trajectories resulting from the movement of objects. Similarity measurement has also been used in other data mining methods such as classification and clustering and is currently, an...

Discovering motion hierarchies via tree-structured coding of trajectories

The dynamic content of physical scenes is largely compositional, that is, the movements of the objects and of their parts are hierarchically organised and relate through composition along this hierarchy. This structure also prevails in the apparent 2D motion that a video captures. Accessing this visual motion hierarchy is important to get a better understanding of dynamic scenes and is useful f...

Incremental Clustering for Trajectories

Trajectory clustering has played a crucial role in data analysis since it reveals underlying trends of moving objects. Due to their sequential nature, trajectory data are often received incrementally, e.g., continuous new points reported by GPS system. However, since existing trajectory clustering algorithms are developed for static datasets, they are not suitable for incremental clustering wit...

A Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach

In recent years, the advancement of information gathering technologies such as GPS and GSM networks have led to huge complex datasets such as time series and trajectories. As a result it is essential to use appropriate methods to analyze the produced large raw datasets. Extracting useful information from large data sets has always been one of the most important challenges in different sciences,...

Publication date: 2008